Model-based Adversarial Imitation Learning

Authors

  • Nir Baram
  • Oron Anschel
  • Shie Mannor
Abstract

Generative adversarial learning is a popular new approach to training generative models that has proven successful on several related problems. The general idea is to maintain an oracle D that discriminates between the expert’s data distribution and that of the generative model G. The generative model is trained to capture the expert’s distribution by maximizing the probability that D misclassifies the data it generates. Overall, the system is differentiable end-to-end and is trained using basic backpropagation. This type of learning has been successfully applied to the problem of policy imitation in a model-free setup. However, a model-free approach does not allow the system to be differentiable, which forces the use of high-variance gradient estimators. In this paper we introduce the Model-based Adversarial Imitation Learning (MAIL) algorithm, a model-based approach to the problem of adversarial imitation learning. We show how to use a forward model to make the system fully differentiable, which enables us to train policies using the (stochastic) gradient of D. Moreover, our approach requires relatively few environment interactions and fewer hyper-parameters to tune. We test our method on the MuJoCo physics simulator and report initial results that surpass the current state-of-the-art.
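The adversarial loop the abstract describes (train D to separate expert data from the policy's, then improve the policy through D's gradient) can be sketched on a toy problem. Everything below is an illustrative assumption, not the paper's method: a one-parameter deterministic policy, a logistic discriminator over scalar actions, and a hypothetical expert whose actions center on 2.0. Because the policy's action is a differentiable function of its parameter, the policy update uses D's gradient directly, mirroring the pathwise training that model-based differentiability enables.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical expert demonstrations: scalar actions around 2.0.
expert_actions = rng.normal(2.0, 0.1, size=256)

theta = -1.0      # policy parameter: deterministic action a = theta
w, b = 0.0, 0.0   # discriminator: D(a) = sigmoid(w*a + b), 1 = "expert"
lr_d, lr_g = 0.1, 0.05

for step in range(2000):
    a_pi = theta
    # Discriminator update: push D(expert) -> 1, D(policy) -> 0.
    for a_e in expert_actions[rng.integers(0, 256, size=8)]:
        d_e = sigmoid(w * a_e + b)
        w += lr_d * (1 - d_e) * a_e   # descend -log D(a_e)
        b += lr_d * (1 - d_e)
        d_p = sigmoid(w * a_pi + b)
        w -= lr_d * d_p * a_pi        # descend -log(1 - D(a_pi))
        b -= lr_d * d_p
    # Policy update: ascend log D(a_pi) using D's own gradient,
    # since a_pi = theta is differentiable in the policy parameter.
    d_p = sigmoid(w * theta + b)
    theta += lr_g * (1 - d_p) * w     # d/dtheta log D(theta) = (1 - D) * w

print(f"learned action {theta:.2f}, expert mean 2.0")
```

The policy parameter drifts from -1.0 toward the expert mean, and at equilibrium the discriminator can no longer separate the two distributions. In the model-free setting this pathwise update is unavailable, which is precisely the gap the paper's forward model closes.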


Similar Articles

Generative Adversarial Imitation Learning

Consider learning a policy from example expert behavior, without interaction with the expert or access to reinforcement signal. One approach is to recover the expert’s cost function with inverse reinforcement learning, then extract a policy from that cost function with reinforcement learning. This approach is indirect and can be slow. We propose a new general framework for directly extracting a...


End-to-End Differentiable Adversarial Imitation Learning

Generative Adversarial Networks (GANs) have been successfully applied to the problem of policy imitation in a model-free setup. However, the computation graph of GANs, which includes a stochastic policy as the generative model, is no longer differentiable end-to-end, which requires the use of high-variance gradient estimation. In this paper, we introduce the Model-based Generative Adversarial Imit...


Energy-Based Sequence GANs for Recommendation and Their Connection to Imitation Learning

Recommender systems aim to find an accurate and efficient mapping from historic data of user-preferred items to a new item that is to be liked by a user. Towards this goal, energy-based sequence generative adversarial nets (EB-SeqGANs) are adopted for recommendation by learning a generative model for the time series of user-preferred items. By recasting the energy function as the feature functi...


Learning a Visual State Representation for Generative Adversarial Imitation Learning

Imitation learning is a branch of reinforcement learning that aims to train an agent to imitate an expert’s behaviour, with no explicit reward signal or knowledge of the world. Generative Adversarial Imitation Learning (GAIL) is a recent model that performs this very well, in a data-efficient manner. However, it has only been used with low-level, low-dimensional state information, with few resu...


Socially-compliant Navigation through Raw Depth Inputs with Generative Adversarial Imitation Learning

We present an approach for mobile robots to learn to navigate in pedestrian-rich environments via raw depth inputs, in a socially compliant manner. To achieve this, we adopt a generative adversarial imitation learning (GAIL) strategy for motion planning, which improves upon a supervised policy model pre-trained via behavior cloning. Our approach overcomes the disadvantages of previous methods, as...



Journal:
  • CoRR

Volume: abs/1612.02179

Published: 2016